ANALYSIS OF THE ORGANIZATION OF LEXICAL MEMORY


Abstract 

The practical outcome of the project, “Analysis of the Organization of Lexical Memory,”
is an electronic lexical database called WordNet that can be incorporated into computer
systems for processing English text. WordNet includes approximately 45,000 lexicalized
concepts, providing a coverage equivalent to a handheld dictionary. The database has three
components, one each for nouns, verbs, and adjectives. The semantic relations that organize
each component are different, but in general a lexicalized concept is represented by a set of
synonyms that can be used to express the concept, and familiar semantic relations are
represented by labeled pointers between synonym sets. In order to create the database,
programs were written to write and edit lexical files, to convert lexical files into a
database, to search the database, to strip inflections from search requests, and to display
retrieved information for a user.

Three user interfaces have been developed for WordNet. (1) The simplest is a command-line
version that does not require a windowing system and can run on standard monitors. (2)
A browser written for SunView and for X-11 windows is intended for use with an on-line
dictionary; by using WordNet, the dictionary can be searched conceptually as well as
alphabetically. (3) A lexical filter written for X-11 windows catches unfamiliar words in a
text file and suggests alternative expressions that an author may wish to choose.


Background 

The on-line database now known as WordNet began as an experiment
designed to test whether certain psycholinguistic claims, namely that
the organization of lexical memory can be represented as a network of
labeled nodes (for lexicalized concepts) connected by labeled arcs (for
semantic relations between concepts), could be extended to cover the
entire lexical core of English. These claims, which can be referred to
generically as the relational hypothesis, were stated in the
psycholinguistic literature in very general terms, but were usually
illustrated with only a handful of carefully chosen lexical items.
Moreover, this relational hypothesis contrasted with other
psycholinguistic claims, which can be referred to generically as the
componential hypothesis, to the effect that the organization of lexical
memory is best represented by analysis into semantic components, rather
than into semantic relations. Fundamental questions about the theory of
lexical knowledge, such as how much of the descriptive load can be
carried by relations and how much by components, were unanswered. In
order to pursue such questions, therefore, it was decided to push the
relational approach as far as it would go, applying it literally to the
entire substantive lexicon of English, to see where it fails and to
discover what kinds of lexical knowledge require more sophisticated
analysis.


The experiment can be counted a success, although a relational
characterization of lexical memory for all of English could not be
implemented as directly as had been anticipated at the beginning; a
number of unexpected problems had to be resolved in order to carry it
through. An initial decision was made to limit the experiment to
semantic relations between open class words; closed class words
(prepositions, pronouns, conjunctions, articles, etc.) are better
characterized by their syntactic properties and relations, and for
practical applications in natural language processing the closed class
words should be an integral part of the parsing program. But even for
open class words there are differences between parts of speech that a
relational representation must respect: for nouns, the relation of
class inclusion is most important; for verbs, a complex set of
entailment relations is required; and modifiers are best characterized
in terms of oppositions. Consequently, discovering what semantic
relations to use required three concurrent and related investigations,
and resulted in three relatively independent networks: one each for
nouns, verbs, and adjectives.

Semantic  Relations 

What terms should a semantic relation relate? A basic assumption here
is that a distinction must be drawn between two common senses of the
word “word,” between words as concrete forms (strings of ASCII
characters in this instance) and words as abstract concepts that the
forms can be used to express. Since computers see character strings
where people see concepts, an important goal of this work was to give
computers something that could be processed as people process concepts.
The initial assumption, therefore, was that semantic relations should
be relations between lexicalized concepts.

A wide variety of semantic relations has been described in the
technical literature, but few were deemed suitable for this research.
The criteria for adoption are simple: (1) Since the basic conception is
that of a network, binary (two-term) semantic relations were
presupposed. (2) Since broad coverage of the lexicon is a prime
consideration, semantic relations with a narrow range of application
are neglected (the relation “ancestor of,” for example, applies only
between kin terms). (3) Since the network is intended for users without
special training in linguistics, semantic relations must be intuitively
obvious to laypersons. (4) Since workers creating the database are
necessarily dependent on standard lexicographic references, semantic
relations that are regularly coded in dictionaries and thesauruses are
preferred. (5) Since exploration of the network in any direction is
desired, only semantic relations that have an obvious reciprocal
relation are adopted. A number of semantic relations survived these
criteria.

The attempt to limit WordNet to semantic relations between lexicalized
concepts failed; in particular, synonymy and antonymy, two basic
semantic relations, hold between lexical forms. The other semantic
relations, however, are relations between lexicalized concepts.

Synonymy: Two word forms are synonyms if there are linguistic contexts
in which one can be substituted for the other without altering the
meaning; “snake” and “serpent.” (N, V, Adj)

Antonymy: Two word forms are direct antonyms if one is the conventional
opposite of the other; “clean” and “dirty.” (N, V, Adj)

Hyponymy/Hypernymy: Forms expressing concept A are hyponyms
(subordinates, subsets) of forms expressing concept B if A is included
in B. If F_A is a hyponym of F_B, then F_B is a hypernym
(superordinate, superset) of F_A; “A house is a (kind of) building.” (N)

Troponymy: Forms expressing concept A are troponyms of forms expressing
concept B if A is a particular manner of doing B; “To march is to walk
in a particular manner.” The reciprocal relation is also coded in the
database, but is called simply “superordinate.” (V)

Meronymy/Holonymy: Forms expressing concept A are meronyms of forms
expressing concept B if A is a part of B. If F_A is a meronym of F_B,
then F_B is a holonym of F_A. Three types of part relations are coded:
(1) member (“The navigator is part of the crew”); (2) material (“The
paper is part of the page”); (3) component (“The wing is part of the
plane”). When the meronym type was uncertain it was coded as a
component part. (N)

Entailment: Forms expressing concept A entail forms expressing concept
B if the occurrence of B is necessary for the occurrence of A, and F_A
and F_B are not related by troponymy; “To fail entails trying.” (V)

Cause:  A  special  case  of  entailment;  “To  kill  is 
to  cause  to  die.”  (V) 

All of these semantic relations hold between words or concepts in the
same syntactic category. Two additional semantic relations, “is an
attribute of” and “is a function of,” have not yet been coded. Both
require pointers between syntactic categories: between adjectives and
nouns in the case of attributes; between verbs and nouns in the case of
functions. It is believed that these relations can be added, and that
the result will be a better simulation of lexical memory and a more
useful database for practical applications.

Although the relations listed above suffice to account for most common
word associations, at least one important feature of lexical memory is
not captured by a purely relational approach, namely, differences in
the familiarity of different words. Although frequency of occurrence is
the preferred measure of familiarity, counts broken down by part of
speech are not presently available for all of the words included in
this database. So an alternative measure was adopted. In general, the
more familiar a word is, the more alternative senses it has, so a sense
count was made for an on-line dictionary; the results are included in
the database for each word by syntactic category.

Finally, since selectional restrictions, the restrictions on noun
phrases that can serve as cases (or arguments) of a verb, are so
important for syntax, the database includes 33 different sentence
frames indicating the admissible syntactic structures for each sense of
every verb.

Implementation

In order to realize a computer simulation of this lexical system, it
was necessary to have a computer representation for lexicalized
concepts as well as lexical forms. The following assumption, therefore,
is basic to the implementation: a lexicalized concept can be
represented by a set of word forms that can express that concept when
used in appropriate contexts. For example, the set {case, lawsuit}
would represent a different meaning of “case” than would {case, box,
carton} or {case, patient}. Such sets of words are called synonym sets
or, briefly, synsets. Of course, a computer that is given a synset does
not “understand” anything, but a human who knows the language will
recognize the intended meaning. But the computer should be able to
process a synset in a manner analogous to the way people process the
corresponding concept.

As work progressed, however, it was discovered that synonyms are not
always available to signal conceptual differences between synsets.
Therefore, the standard lexicographic method of adding a defining gloss
was adopted to clarify the intended distinctions. Since this resort to
definitions came relatively late, they are available for only about 30%
of the synsets. They are coded parenthetically and can be either
displayed or suppressed by the interface.

Given this coding for synonymy, other semantic relations can be coded
either by pointers between word forms or by pointers between synsets.
For example, the fact that “war” is an antonym of “peace” is coded
[war !-> peace], and the fact that tennis is a kind of court game is
coded {tennis, lawn_tennis} > {court_game}. These semantic relations
are entered by lexical coders; the reciprocal relations are then added
automatically by a program known as the “grinder,” which converts
lexical files into a lexical database.
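
The automatic addition of reciprocal relations can be pictured with a
small sketch. It is only an illustration in Python, not the grinder
itself (which the text says is written in C and C++): the relation
names, the RECIPROCAL table, and the triple encoding are assumptions
made here, and synsets are modeled as frozensets while word forms are
plain strings.

```python
# Hypothetical reciprocal table; the relation names follow the text,
# but this encoding is an assumption, not the WordNet file format.
RECIPROCAL = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
    "antonym": "antonym",
}

def add_reciprocals(pointers):
    """Return the pointer set closed under the reciprocal relations.

    `pointers` is an iterable of (source, relation, target) triples.
    Sources and targets are frozensets (synsets) or strings (word forms).
    """
    pointers = list(pointers)
    closed = set(pointers)
    for source, relation, target in pointers:
        closed.add((target, RECIPROCAL[relation], source))
    return closed

tennis = frozenset({"tennis", "lawn_tennis"})
court_game = frozenset({"court_game"})
coded = [
    (tennis, "hypernym", court_game),  # tennis is a kind of court game
    ("war", "antonym", "peace"),       # antonymy holds between word forms
]
full = add_reciprocals(coded)
```

Here the coder enters two pointers and the sketch supplies the two
reciprocals, so that the network can be explored in either direction.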

Software developed in order to implement this system is written in C
and C++ and includes the following components:

Editor: These programs support the work of entering information into
the lexical files. To supplement the editor, there are programs to
search and display the contents of on-line dictionaries, to verify the
syntax of the lexical files, to recast a noun file in the form of an
outline, and to provide an archive to keep track of the files as they
are edited and updated.


Grinder: This large program turns the lexical files into a database.
It first checks for coding errors and requests corrections. Then it
inserts all of the reciprocal semantic relations that coders omit, and
outputs the result as a coherent database with a unique identifier for
every synset. Finally, it constructs an index of the letter strings,
listing all of the synsets in which each string appears.
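
The last two grinder steps (unique identifiers and a string index) can
be sketched as follows. This is a minimal Python illustration under
stated assumptions: the function name build_index is hypothetical, and
integer identifiers are used only for convenience, since the text does
not say what form the real identifiers take.

```python
def build_index(synsets):
    """Assign an identifier to every synset and index the strings.

    Returns (ids, index): ids maps identifier -> synset, and index maps
    each string to the list of identifiers of the synsets containing it.
    Integer identifiers are an assumption of this sketch.
    """
    ids = {}
    index = {}
    for ident, synset in enumerate(synsets):
        ids[ident] = synset
        for string in synset:
            index.setdefault(string, []).append(ident)
    return ids, index

# The three senses of "case" used as an example earlier in the text.
synsets = [
    {"case", "lawsuit"},
    {"case", "box", "carton"},
    {"case", "patient"},
]
ids, index = build_index(synsets)
```

Looking up index["case"] then yields the identifiers of all three
synsets, which is the information a search routine needs before it can
follow pointers.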

Search routines: A set of routines accepts requests as input and
returns information retrieved from the database. A request consists of
a letter string and an identifier for the kind of semantic relation
that is desired.

Morphology: The WordNet database contains primarily canonical word
forms. That is to say, it contains information about the singular
“tree” but not about the plural “trees,” about present tense “hurl” but
not past tense “hurled,” etc. For practical applications, therefore, it
is necessary to have a morphology program that will transform these
inflected forms into the canonical forms contained in the database.
This program is fairly conventional. It contains an extensive list of
exceptions (words that do not follow the rules of English morphology).
If a requested character string is on this list, its canonical form
will be used to search the database. If a character string is not on
the exception list and is not in the database, the program will attempt
to strip inflections from it in order to arrive at a string that can be
found in the database. Only if these attempts fail will the program
report that the string is not in the database.
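
The lookup order described above (exception list, then the database,
then inflection stripping) can be sketched in a few lines of Python.
Everything concrete here is an assumption for illustration: the
exception entries, the toy database, and the suffix rules are invented
and are not the contents of the actual morphology program.

```python
# Toy data: these entries and suffix rules are illustrative assumptions,
# not the contents of the WordNet exception list or its actual rules.
EXCEPTIONS = {"went": "go", "threw": "throw", "thrown": "throw"}
DATABASE = {"tree", "hurl", "go", "throw", "fight"}
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ed", ""), ("ing", "")]

def canonical_form(string):
    """Return the canonical form used to search the database, or None."""
    # 1. Irregular forms are looked up in the exception list first.
    if string in EXCEPTIONS:
        return EXCEPTIONS[string]
    # 2. A string already in the database is its own canonical form.
    if string in DATABASE:
        return string
    # 3. Otherwise try stripping inflections until a known string appears.
    for suffix, replacement in SUFFIX_RULES:
        if string.endswith(suffix):
            candidate = string[: len(string) - len(suffix)] + replacement
            if candidate in DATABASE:
                return candidate
    # 4. Only now is the string reported as not in the database.
    return None
```

With this toy data, canonical_form("hurled") gives "hurl" and
canonical_form("went") gives "go", while an unknown string gives None.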

Combined with search routines, this morphology program takes inflected
inputs and returns canonical outputs, e.g., a request for synonyms of
“hurled” will elicit “throw.” A more sophisticated morphology program
that will return inflected outputs (one that will give “threw” or
“thrown” as synonyms of “hurled”) is under development as part of the
lexical filter application described below.

Interface: Several interfaces have been created to display information
that is retrieved for the user. The simplest is a command-line version
that can be used on any monitor. A more elaborate interface, using
SunView (a windowing system owned by Sun Microsystems, Inc.), was used
for systems development. And an interface using the X-11 window system
was developed for general distribution with the database. These
interfaces are described in more detail in the section on Applications,
below.

Man pages: For Unix systems, a set of man pages is available. A user
should look first at wnintro(1), which gives an overview of the man
pages in chapter 1 of the manual. They include nverify(1) to describe a
program that checks the syntax of lexical files, grind(1) to describe
operation of the grinder, wntool(1) for the SunView interface, xwn(1)
for the X-11 interface, and wn(1) for the command-line interface. There
is also wnintro(5), which introduces wninput(5) for the syntax of the
lexical input files and wndb(5) for the syntax of the database itself.

Coverage 

The goal for WordNet was to include approximately the same vocabulary
that one expects to find in a collegiate dictionary. Because the format
is so different from a printed dictionary, however, numerical
comparisons cannot be made directly. Three different numbers are needed
to characterize the size of WordNet: (1) the number of character
strings (ASCII strings); (2) the number of synsets; and (3) the number
of unique string-synset combinations. (If the same string occurs in
five synsets, it counts as one string but five unique string-synset
combinations, i.e., each distinct sense of a string is considered to be
a different word.) These numbers, broken down by syntactic category,
are given in the following table, where the unique string-synset
combinations are referred to simply as “Words.”


Category     Strings   Synsets   Words

Nouns         36,114    28,276   48,672

Verbs          9,699     6,087   15,824

Adjectives    12,283    10,620   23,912

Total         58,096    44,983   88,408
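
The three size measures can be made concrete with a short Python
sketch. The function name size_counts is hypothetical, and the input is
simply the three example senses of “case” from the Implementation
section rather than real database contents.

```python
def size_counts(synsets):
    """Return (strings, synsets, words) for a list of synsets.

    strings: number of distinct character strings
    synsets: number of synonym sets
    words:   number of unique string-synset combinations
    """
    strings = set()
    words = 0
    for synset in synsets:
        strings.update(synset)
        words += len(set(synset))  # one combination per string per synset
    return len(strings), len(synsets), words

example = [
    {"case", "lawsuit"},
    {"case", "box", "carton"},
    {"case", "patient"},
]
counts = size_counts(example)
```

For these three synsets the string “case” counts once as a string but
three times as a word, so the counts are 5 strings, 3 synsets, and 7
words.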

Much of the work of creating WordNet, however, consisted of inserting
pointers between synsets to represent semantic relations between
concepts, and the novelty and utility of the system depend on these
relations. The total numbers of pointers for the various semantic
relations coded in the database are shown in the following table.

Category     Pointers   Definitions

Nouns          40,087         7,164

Verbs          10,771         2,562

Adjectives     13,854         3,962

Total          64,712        13,688

This table also gives the number of synsets in each syntactic category
that have an accompanying parenthetical defining phrase.

Applications 

Although WordNet was initially intended as an experiment, the success
of the experiment will be tested by the usefulness of the resulting
database. The WordNet database is available for general use in natural
language processing and is expected to enrich the content of a variety
of practical applications. Three examples were developed under this
contract, two of which (a command-line interface and a browser) were
required in order to develop the database, and one of which (a lexical
filter) is intended to assist writers.

Command line: The simplest interface requires a user to tag the request
for information about a word with an indication as to what information
is requested. This interface can deal with inflectional morphology. For
example, the command line:

wn went -synsv

returns all synsets for the verb “go.” The command with three tags:

wn fights -synsn -synsv -synsa

will elicit a report for all synsets of “fight” (in this case, as a
noun and verb, but not as an adjective). The wn command without
arguments is a request for help: it produces a list of all the
available tags. Definitional glosses will not be shown unless the tag
-d is inserted immediately following the target word.

Although the command-line interface is simple, some of the commands are
relatively complex. For example, the tag -palln will not only return
the parts that are directly coded as parts of the searchword, but will
also list all of the parts that the searchword inherits from its
hypernyms.
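
The inheritance of parts from hypernyms amounts to a walk up the
hypernym chain, collecting the parts coded at each level. The Python
sketch below illustrates the idea only; the HYPERNYM and PARTS tables
are invented toy data, the function name all_parts is hypothetical, and
each entry is assumed to have at most one hypernym.

```python
# Invented toy data, not taken from the WordNet database.
HYPERNYM = {"jet": "plane", "plane": "vehicle"}  # word -> its hypernym
PARTS = {"jet": ["jet_engine"], "plane": ["wing"], "vehicle": []}

def all_parts(word):
    """Return directly coded parts followed by parts inherited from hypernyms."""
    parts = []
    seen = set()
    current = word
    while current is not None and current not in seen:
        seen.add(current)                 # guard against cycles in bad data
        parts.extend(PARTS.get(current, []))
        current = HYPERNYM.get(current)   # None once the top is reached
    return parts
```

With this toy data, all_parts("jet") lists the directly coded part
first and then the part inherited from “plane.”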

Browser: The interface used for developing WordNet was called
“lexpert” or “browser.” Initially, it was a window in the SunView
window system; subsequently it was rewritten as an X-11 window. A
target word can be typed or dragged to the input slot to start a
search. If the word is found in the database, buttons appear indicating
that WordNet knows about the word as a noun, or a verb, or an
adjective, or some combination. The mouse can then be used to expose a
menu that lists all of the kinds of information available about that
word. The same searches are available with the browser that are
available with the command-line interface, but commands that will not
yield information are “greyed out” on the menu. By selecting from the
menu, a user can pursue the particular semantic relation of interest.
For nouns, the user may have a choice among synonyms, antonyms,
hyponyms, hypernyms, or meronyms, or may ask about the word’s
familiarity. For verbs, the user may select from synonyms, antonyms,
superordinates, troponyms, entailments, cause, familiarity, or sentence
frames. For adjectives, the user may select synonyms, antonyms, or
familiarity. When this interface is used to write lexical files, it is
used in conjunction with on-line dictionaries. Thus it becomes possible
to search the dictionary conceptually, not merely alphabetically.

Since inflections are stripped from input requests, the browser can
also be used while composing a text file; words in the text can be
highlighted with the cursor and dragged to WordNet. The third interface
was an attempt to capitalize on this feature.

Filter: The filter program is an attempt to use WordNet as part of a
writer’s assistant. It is not interactive. It takes a text file as
input and goes through it word by word. If a word in the text is not
found in WordNet, it is added to a list in a file of “unknown words.”
Experience with the lexical filter has shown that many of the unknown
words are proper nouns, some are typographical mistakes, but some are
words that clearly should be added to the WordNet database. If a word
in the text is found in WordNet, its familiarity is tested; if it is
familiar, the filter does nothing, but if it is unfamiliar, the filter
prints out all of the synsets in which the word occurs, accompanying
each word with its familiarity value. That is to say, an author is not
only told that a word is unfamiliar; an attempt is made to suggest more
familiar alternatives.
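
The filter's decision procedure can be sketched as follows. This is a
simplified Python illustration, not the X-11 program: the sense counts
and synsets are invented, the function name lexical_filter and the
familiarity threshold are assumptions, and the inflection stripping
that the real filter would need is omitted.

```python
# Hypothetical data: the sense counts and synsets below are invented for
# illustration and are not taken from the WordNet database.
SENSE_COUNT = {"throw": 12, "hurl": 3, "cast": 9, "snake": 4, "serpent": 1}
SYNSETS = [{"throw", "hurl", "cast"}, {"snake", "serpent"}]

def lexical_filter(words, threshold=2):
    """Return (unknown_words, suggestions) for a sequence of words.

    A word whose sense count is at or below `threshold` is treated as
    unfamiliar; the threshold value is an assumption of this sketch.
    """
    unknown = []
    suggestions = {}
    for word in words:
        if word not in SENSE_COUNT:
            unknown.append(word)  # goes to the "unknown words" list
            continue
        if SENSE_COUNT[word] > threshold:
            continue  # familiar word: the filter does nothing
        # Unfamiliar word: report every synset containing it, pairing
        # each member with its familiarity value.
        suggestions[word] = [
            sorted((w, SENSE_COUNT[w]) for w in synset)
            for synset in SYNSETS
            if word in synset
        ]
    return unknown, suggestions

unknown, suggestions = lexical_filter(["serpent", "throw", "Princeton"])
```

In this run “Princeton” lands on the unknown list, “throw” passes
silently, and “serpent” is reported together with the more familiar
“snake.”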

In its present form, the filter frequently suggests alternatives that
are inappropriate. For example, they may be for the wrong part of
speech. More often, even when they are in the correct syntactic
category, they include other senses of the word. Since the filter
responds to unfamiliar words and unfamiliar words are seldom ambiguous,
these problems are not severe. But a simple parser (or “parts” program)
that could use the context in order to discriminate among nouns, verbs,
and adjectives would eliminate syntactic confusions. A more intelligent
system would be required to eliminate semantic ambiguity. For example,
the text-critiquing program being developed by David Kieras at the
University of Michigan is one such intelligent system for assisting
writers; Kieras is exploring the use of the semantic information in
WordNet to enhance the capabilities of that system. Other opportunities
to evaluate WordNet in a testbed provided by a language understanding
system are under discussion.

Preliminary results thus confirm the commonsense conclusion that
WordNet is best used
in  conjunction  with  other  components  as  one  part 
of  a  more  powerful  system  for  natural  language 
processing.  The  fact  that  such  marriages  are  pos¬ 
sible,  however,  indicates  that  WordNet  does  pro¬ 
vide  an  effective  combination  of  traditional  lexi¬ 
cographic  information  with  modern  computer 
technology. 

Availability 

Copyright to WordNet is held by Princeton University in order to
protect the rights of the developers to use their own work and make it
available to others, and an application is being filed to protect the
term “WordNet.” However, an early version has been running on computers
at NPRDC, and the database, search code, morphology routines,
interface, and man pages (a 7-Mbyte package, WordNet 1.0) are available
for public distribution. Inquiries addressed to wordnet@princeton.edu
should elicit information about how to obtain these materials via ftp;
it is hoped that the Lexical Consortium at New Mexico State University
will distribute these materials. If demand justifies it, the package
can be made available on a CD-ROM disk.